Abstract

The quantum Fourier transform (QFT), a quantum analog of the classical Fourier transform, 
has been shown to be a powerful tool in developing quantum algorithms. However, in classical 
computing there is another class of unitary transforms, the wavelet transforms, which are every 
bit as useful as the Fourier transform. Wavelet transforms are used to expose the multi-scale 
structure of a signal and are likely to be useful for quantum image processing and quantum data 
compression. In this paper, we derive efficient, complete, quantum circuits for two representative 
quantum wavelet transforms, the quantum Haar and quantum Daubechies transforms. Our 
approach is to factor the classical operators for these transforms into direct sums, direct products 
and dot products of unitary matrices. In so doing, we find that permutation matrices, a partic- 
ular class of unitary matrices, play a pivotal role. Surprisingly, we find that operations that are 
easy and inexpensive to implement classically are not always easy and inexpensive to implement 
quantum mechanically, and vice versa. In particular, the computational cost of performing cer- 
tain permutation matrices is ignored classically because they can be avoided explicitly. However, 
quantum mechanically, these permutation operations must be performed explicitly and hence their 
cost enters into the full complexity measure of the quantum transform. We consider the particular 
set of permutation matrices arising in quantum wavelet transforms and develop efficient quantum 
circuits that implement them. This allows us to design efficient, complete quantum circuits for the 
quantum wavelet transform. 
1 Introduction

The field of quantum computing has undergone an explosion of activity over the past few years. 
Several important quantum algorithms are now known. Moreover, prototypical quantum computers 
have already been built using nuclear magnetic resonance [1, 2] and nonlinear optics technologies 
[3]. Such devices are far from being general-purpose computers. Nevertheless, they constitute
significant milestones along the road to practical quantum computing.

1 Presented at 1st NASA Int. Conf. on Quantum Computing and Communication, Palm Springs, CA, Feb. 17-21,
1998.

A quantum computer is a physical device whose natural evolution over time can be interpreted 
as the execution of a useful computation. The basic element of a quantum computer is the quantum 
bit or "qubit", implemented physically as the state of some convenient 2-state quantum system such
as the spin of an electron. Whereas a classical bit must be either a 0 or a 1 at any instant, a qubit
is allowed to be an arbitrary superposition of a 0 and a 1 simultaneously. To make a quantum
memory register we simply consider the simultaneous state of (possibly entangled) tuples of qubits. 

The state of a quantum memory register, or any other isolated quantum system, evolves in time 
according to some unitary operator. Hence, if the evolved state of a quantum memory register is 
interpreted as having implemented some computation, that computation must be describable as 
a unitary operator. If the quantum memory register consists of n qubits, this operator will be 
represented, mathematically, as some $2^n \times 2^n$ dimensional unitary matrix.

Several quantum algorithms are now known, the most famous examples being Deutsch and 
Jozsa's algorithm for deciding whether a function is constant or balanced [4], Shor's algorithm for
factoring a composite integer [5] and Grover's algorithm for finding an item in an unstructured 
database [6]. However, the field is growing rapidly and new quantum algorithms are being discovered 
every year. Some recent examples include Brassard, Hoyer, and Tapp's quantum algorithm for 
counting the number of solutions to a problem [7], Cerf, Grover, and Williams' quantum algorithm
for solving NP-complete problems by nesting one quantum search within another [8] and van Dam, 
Hoyer, and Tapp's algorithm for distributed quantum computing [9]. 

The fact that quantum algorithms are describable in terms of unitary transformations is both 
good news and bad for quantum computing. The good news is that knowing that a quantum 
computer must perform a unitary transformation allows theorems to be proved about the tasks that 
quantum computers can and cannot do. For example, Zalka has proved that Grover's algorithm is 
optimal [10]. Aharonov, Kitaev, and Nisan have proved that a quantum algorithm that involves 
intermediate measurements is no more powerful than one that postpones all measurements until 
the end of the unitary evolution stage [11]. Both these proofs rely upon quantum algorithms being 
unitary transformations. On the other hand, the bad news is that many computations that we 
would like to perform are not originally described in terms of unitary operators. For example, 
a desired computation might be nonlinear, irreversible or both nonlinear and irreversible. As a 
unitary transformation must be linear and reversible we might need to be quite creative in encoding 
a desired computation on a quantum computer. Irreversibility can be handled by incorporating 
extra "ancilla" qubits that permit us to remember the input corresponding to each output. But 
nonlinear transformations are still problematic. 

Fortunately, there is an important class of computations, the unitary transforms, such as the 
Fourier transform, Walsh-Hadamard transform and assorted wavelet transforms, that are describ- 
able, naturally, in terms of unitary operators. Of these, the Fourier and Walsh-Hadamard trans- 
forms have been the ones studied most extensively by the quantum computing community. In fact, 
the quantum Fourier transform (QFT) is now recognized as being pivotal in many known quantum 
algorithms [12]. The quantum Walsh-Hadamard transform is a critical component of both Shor's 
algorithm [5] and Grover's algorithm [6]. However, the wavelet transforms are every bit as useful 
as the Fourier transform, at least in the context of classical computing. For example, wavelet 
transforms are particularly suited to exposing the multi-scale structure of a signal. They are likely 
to be useful for quantum image processing and quantum data compression. It is natural therefore 
to consider how to achieve a quantum wavelet transform. 






Starting with the unitary operator for the wavelet transform, the next step in the process of 
finding a quantum circuit that implements it, is to factor the wavelet operator into the direct 
sum, direct product and dot product of smaller unitary operators. These operators correspond to 
1-qubit and 2-qubit quantum gates. For such a circuit to be physically realizable, the number of 
gates within it must be bounded above by a polynomial in the number of qubits, n. Finding such 
a factorization can be extremely challenging. For example, although there are known algebraic
techniques for factoring an arbitrary $2^n \times 2^n$ operator, e.g. [13], they are guaranteed to produce
$O(2^n)$, i.e., exponentially many, terms in the factorization. Hence, although such a factorization
is mathematically valid, it is physically unrealizable because, when treated as a quantum circuit
design, it would require too many quantum gates. Indeed, Knill has proved that an arbitrary unitary
matrix will require exponentially many quantum gates if we restrict ourselves to using only gates 
that correspond to all 1-qubit rotations and XOR [14]. It is therefore clear that the key enabling 
factor for achieving an efficient quantum implementation, i.e., with a polynomial time and space 
complexity, is to exploit the specific structure of the given unitary operator. 

Perhaps the most striking example of the potential for achieving compact and efficient quantum
circuits is the case of the Walsh-Hadamard transform. In quantum computing, this transform
arises whenever a quantum register is loaded with all integers in the range 0 to $2^n - 1$. Classically,
application of the Walsh-Hadamard transform on a vector of length $2^n$ involves a complexity
of $O(2^n)$. Yet, by exploiting the factorization of the Walsh-Hadamard operator in terms of the
Kronecker product, it can be implemented with a complexity of $O(1)$ by n identical 1-qubit quantum
gates. Likewise, the classical FFT algorithm has been found to be implementable in a polynomial
space and time complexity, quantum circuit [15] (see also Sec. 2.3). However, exploitation of the 
operator structure arising in the wavelet transforms (and perhaps other unitary transforms) is more 
challenging. 
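
The Kronecker-product structure mentioned above can be illustrated numerically. The following is a minimal classical sketch in Python with NumPy, not a quantum implementation; the helper name `walsh_hadamard` is hypothetical. It builds the $2^n \times 2^n$ Walsh-Hadamard matrix as the n-fold Kronecker product of the 1-qubit gate W and applies it to the all-zeros basis state.

```python
import numpy as np
from functools import reduce

# 1-qubit Walsh-Hadamard gate W.
W = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def walsh_hadamard(n):
    """Return the 2^n x 2^n Walsh-Hadamard matrix as the n-fold Kronecker product of W."""
    return reduce(np.kron, [W] * n)

n = 3
WH = walsh_hadamard(n)

# Applying it to |00...0> gives an equal superposition of the integers 0 .. 2^n - 1.
zero_state = np.zeros(2 ** n)
zero_state[0] = 1.0
uniform = WH @ zero_state
print(np.round(uniform * np.sqrt(2 ** n), 6))
```

The explicit matrix has $2^n \times 2^n$ entries, which is the exponential classical cost, while the factorization itself only names n copies of a 2 x 2 gate, one per qubit.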

A key technique, in classical computing, for exposing and exploiting specific structure of a 
given unitary transform is the use of permutation matrices. In fact, there is an extensive literature 
in classical computing on the use of permutation matrices for factorizing unitary transforms into 
simpler forms that enable efficient implementations to be devised (see, for example, [16] and [17]). 
However, the underlying assumption in using the permutation matrices in classical computation 
is that they can be implemented easily and inexpensively. Indeed, they are considered so trivial 
that the cost of their implementation is often not included in the complexity analysis. This is 
because any permutation matrix can be described by its effect on the ordering of the elements of 
a vector. Hence, it can simply be implemented by re-ordering the elements of the vector involving 
only data movement and without performing any arithmetic operations. As is shown in this paper, 
the permutation matrices also play a pivotal role in the factorization of the unitary operators that 
arise in the wavelet transforms. However, unlike in classical computing, the cost of implementing
the permutation matrices cannot be neglected in quantum computing. Indeed, the main issue in
deriving feasible and efficient quantum circuits for the quantum wavelet transforms considered in
this paper is the design of efficient quantum circuits for certain permutation matrices. Note that
any permutation matrix acting on n qubits can mathematically be represented by a $2^n \times 2^n$ unitary
operator. Hence, it is possible to factor any permutation matrix by using general techniques such
as [13], but this would lead to an exponential time and space complexity. However, the permutation
matrices, due to their specific structure (i.e., sparsity pattern), represent a very special subclass of
unitary matrices. Therefore, the key to achieving an efficient quantum implementation of permutation
matrices is the exploitation of this specific structure.
In this paper, we first develop efficient quantum circuits for a set of permutation matrices arising
in the development of the quantum wavelet transforms (and the quantum Fourier transform). 
We propose three techniques for an efficient quantum implementation of permutation matrices, 
depending on the permutation matrix considered. In the first technique, we show that a certain 
class of permutation matrices, designated as qubit permutation matrices, can directly be described 
by their effect on the ordering of qubits. This quantum description is very similar to the classical
description of the permutation matrices. We show that the Perfect Shuffle permutation matrix,
designated as $\Pi_{2^n}$, and the Bit Reversal permutation matrix, designated as $P_{2^n}$, which arise in the
quantum wavelet and Fourier transforms (as well as in many other classical computations) belong
to this class. We present a new gate, designated as the qubit swap gate or $\Pi_4$, which can be used
to directly derive efficient quantum circuits for implementation of the qubit permutation matrices.
Interestingly, such circuits for quantum implementation of $\Pi_{2^n}$ and $P_{2^n}$ lead to new factorizations of
these two permutation matrices which were not previously known in classical computation. A second
technique is based on a quantum arithmetic description of permutation matrices. In particular, 
we consider the downshift permutation matrix, designated as $Q_{2^n}$, which plays a major role in
derivation of quantum wavelet transforms and also frequently arises in many classical computations
[16]. We show that a quantum description of $Q_{2^n}$ can be given as a quantum arithmetic operator.
This description then allows the quantum implementation of $Q_{2^n}$ by using the quantum arithmetic
circuits proposed in [18]. 

A third technique is based on developing totally new factorizations of the permutation matrices. 
This technique is the most case dependent, challenging, and even counterintuitive (from a classical 
computing point of view). For this technique, we again consider the permutation matrix $Q_{2^n}$ and
we show that it can be factored in terms of the FFT, which then allows its implementation by using
the circuits for the QFT. More interestingly, however, we derive a recursive factorization of $Q_{2^n}$ which
was not previously known in classical computation. This new factorization enables a direct and
efficient implementation of $Q_{2^n}$. Our analysis, though of a limited set of permutation matrices,
reveals some of the surprises of quantum computing in contrast to classical computing. That is,
certain operations that are hard to implement in classical computing are much easier to implement
in quantum computing and vice versa. As a specific example, while the classical implementations
of $\Pi_{2^n}$ and $P_{2^n}$ are much harder (in terms of the data movement pattern) than that of $Q_{2^n}$, their quantum
implementations are much easier and more straightforward than that of $Q_{2^n}$.

Given a wavelet kernel, its application is usually performed according to the packet or pyramid 
algorithms. Efficient quantum implementation of these two algorithms requires efficient circuits
for operators of the form $I_{2^{n-i}} \otimes \Pi_{2^i}$ and $\Pi_{2^i} \oplus I_{2^n - 2^i}$ for some $i$, where $\otimes$ and $\oplus$ designate,
respectively, the Kronecker product and the direct sum operator. We show that these operators
can be efficiently implemented by using our proposed circuits for implementation of $\Pi_{2^i}$. We
then consider two representative wavelet kernels, the Haar [17] and Daubechies [19] wavelets,
which have previously been considered by Hoyer [20]. For the Haar wavelet, we show that Hoyer's
proposed solution is incomplete since it does not lead to a gate-level circuit and, consequently, it
does not allow the analysis of time and space complexity. We propose a scheme for design of a
complete gate-level circuit for the Haar wavelet and analyze its time and space complexity. For the
Daubechies wavelet, we develop three new factorizations which lead to three gate-level circuits
for its implementation. Interestingly, one of these factorizations allows efficient implementation of
Daubechies wavelets by using the circuit for the QFT.
2 Efficient Quantum Circuits for Two Fundamental Qubit Permutation Matrices: Perfect Shuffle and Bit-Reversal



In this section, we develop quantum circuits for two fundamental permutation matrices, the perfect
shuffle, $\Pi_{2^n}$, and the bit reversal, $P_{2^n}$, permutation matrices, which arise in quantum wavelet and
Fourier transforms as well as many classical computations involving unitary transforms for signal 
and image processing [16]. For quantum computing, these two permutation matrices can directly be 
described in terms of their effect on ordering of qubits. This enables the design of efficient circuits 
for their implementation. Interestingly, these circuits lead to the discovery of new factorizations 
for these two permutation matrices. 



2.1 Perfect Shuffle Permutation Matrices 

A classical description of $\Pi_{2^n}$ can be given by describing its effect on a given vector. If Z is a
$2^n$-dimensional vector, then the vector $Y = \Pi_{2^n} Z$ is obtained by splitting Z in half and then shuffling
the top and bottom halves of the deck. Alternatively, a description of the matrix $\Pi_{2^n}$, in terms of
its elements $\Pi_{ij}$, for i and j = 0, 1, ..., $2^n - 1$, can be given as

$$\Pi_{ij} = \begin{cases} 1 & \text{if } j = i/2 \text{ and } i \text{ is even, or if } j = (i-1)/2 + 2^{n-1} \text{ and } i \text{ is odd} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

As first noted by Hoyer [20], a quantum description of $\Pi_{2^n}$ can be given by

$$\Pi_{2^n} : |a_{n-1}\, a_{n-2} \cdots a_1\, a_0\rangle \mapsto |a_0\, a_{n-1}\, a_{n-2} \cdots a_1\rangle \tag{2}$$

That is, for quantum computation, $\Pi_{2^n}$ is the operator which performs the left qubit-shift operation
on n qubits. Note that $\Pi^t_{2^n}$ (t indicates the transpose) performs the right qubit-shift operation,
i.e.,

$$\Pi^t_{2^n} : |a_{n-1}\, a_{n-2} \cdots a_1\, a_0\rangle \mapsto |a_{n-2} \cdots a_1\, a_0\, a_{n-1}\rangle \tag{3}$$
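
As a classical illustration of the element-wise definition (1), the sketch below builds the perfect shuffle matrix in Python with NumPy and applies it to a vector. The helper names `perfect_shuffle` and `rotate_right` are hypothetical. The last check reads the permutation at the level of vector indices: $Y_i = Z_j$, where j is obtained from i by moving the lowest bit $a_0$ to the front, which is the bit pattern appearing on the right-hand side of (2).

```python
import numpy as np

def perfect_shuffle(n):
    """Build the 2^n x 2^n perfect shuffle matrix from its element-wise definition (1)."""
    N = 2 ** n
    P = np.zeros((N, N), dtype=int)
    for i in range(N):
        j = i // 2 if i % 2 == 0 else (i - 1) // 2 + N // 2
        P[i, j] = 1
    return P

def rotate_right(i, n):
    """Cyclically rotate the n-bit pattern of i: a_{n-1}...a_1 a_0 -> a_0 a_{n-1}...a_1."""
    return (i >> 1) | ((i & 1) << (n - 1))

n = 3
Pi = perfect_shuffle(n)
Z = np.arange(2 ** n)   # Z_k = k, so Y directly shows which entry of Z lands where
Y = Pi @ Z

print(Y)  # top half and bottom half of Z interleaved
```

Because `Z` holds its own indices, `Y[i]` equals `rotate_right(i, n)` for every i, and the inverse permutation is simply the transpose of `Pi`.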



2.2 Bit-Reversal Permutation Matrices 

A classical description of $P_{2^n}$ can be given by describing its effect on a given vector. If Z is a
$2^n$-dimensional vector and $Y = P_{2^n} Z$, then $Y_i = Z_j$, for i = 0, 1, ..., $2^n - 1$, wherein j is obtained
by reversing the bits in the binary representation of index i. Therefore, a description of the matrix
$P_{2^n}$, in terms of its elements $P_{ij}$, for i and j = 0, 1, ..., $2^n - 1$, is given as

$$P_{ij} = \begin{cases} 1 & \text{if } j \text{ is the bit reversal of } i \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

A factorization of $P_{2^n}$ in terms of $\Pi_{2^i}$ is given as [16]

$$P_{2^n} = \Pi_{2^n} (I_2 \otimes \Pi_{2^{n-1}}) \cdots (I_{2^i} \otimes \Pi_{2^{n-i}}) \cdots (I_{2^{n-3}} \otimes \Pi_8)(I_{2^{n-2}} \otimes \Pi_4) \tag{5}$$

A quantum description of $P_{2^n}$ is given as

$$P_{2^n} : |a_{n-1}\, a_{n-2} \cdots a_1\, a_0\rangle \mapsto |a_0\, a_1 \cdots a_{n-2}\, a_{n-1}\rangle \tag{6}$$
That is, $P_{2^n}$ is the operator which reverses the order of n qubits. This quantum description
can be seen from the factorization of $P_{2^n}$, given by (5), and the quantum description of the permutation
matrices $\Pi_{2^i}$. It is interesting to note that for classical computation the term "bit-reversal" refers
to reversing the bits in the binary representation of the index of the elements of a vector while, for
quantum computation, the matrix $P_{2^n}$ literally performs a reversal of the order of qubits.

Note that $P_{2^n}$ is symmetric, i.e., $P_{2^n} = P^t_{2^n}$ [16]. This can also be easily proved based on the
quantum description of $P_{2^n}$, since if the qubits are reversed twice then the original ordering of the
qubits is restored. This implies that $P_{2^n} P_{2^n} = I_{2^n}$ and, since $P_{2^n}$ is orthogonal, i.e., $P_{2^n} P^t_{2^n} = I_{2^n}$,
it then follows that $P_{2^n} = P^t_{2^n}$.
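
The factorization (5) and the symmetry argument can be checked numerically for small n. The following Python/NumPy sketch is only an illustration under the element-wise definitions (1) and (4); the helper names `perfect_shuffle`, `bit_reversal` and `bit_reversal_from_shuffles` are hypothetical.

```python
import numpy as np

def perfect_shuffle(n):
    """2^n x 2^n perfect shuffle matrix from definition (1)."""
    N = 2 ** n
    P = np.zeros((N, N), dtype=int)
    for i in range(N):
        j = i // 2 if i % 2 == 0 else (i - 1) // 2 + N // 2
        P[i, j] = 1
    return P

def bit_reversal(n):
    """2^n x 2^n bit-reversal matrix from definition (4)."""
    N = 2 ** n
    P = np.zeros((N, N), dtype=int)
    for i in range(N):
        j = int(format(i, f"0{n}b")[::-1], 2)  # reverse the n-bit pattern of i
        P[i, j] = 1
    return P

def bit_reversal_from_shuffles(n):
    """Multiply out the factors (I_{2^i} kron Pi_{2^{n-i}}), i = 0 .. n-2, left to right as in (5)."""
    prod = np.eye(2 ** n, dtype=int)
    for i in range(n - 1):
        prod = prod @ np.kron(np.eye(2 ** i, dtype=int), perfect_shuffle(n - i))
    return prod

n = 4
P = bit_reversal(n)
print(np.array_equal(bit_reversal_from_shuffles(n), P))  # factorization (5)
print(np.array_equal(P, P.T))                            # symmetry
print(np.array_equal(P @ P, np.eye(2 ** n, dtype=int)))  # reversing twice restores the order
```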



2.3 Quantum FFT and Bit-Reversal Permutation Matrix 

Here, we review the quantum FFT algorithm since it not only arises in the derivation of the quantum
wavelet transforms (see Sec. 4.3) but also represents a case in which the roles of the permutation
matrices $\Pi_{2^n}$ and $P_{2^n}$ seem to have been overlooked in the quantum computing literature.

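The classical Cooley-Tukey FFT factorization for a $2^n$-dimensional vector is given by [16]

$$F_{2^n} = A_n A_{n-1} \cdots A_1 P_{2^n} = \underline{F}_{2^n} P_{2^n} \tag{7}$$

where $A_i = I_{2^{n-i}} \otimes B_{2^i}$, $B_{2^i} = \frac{1}{\sqrt{2}} \begin{pmatrix} I_{2^{i-1}} & \Omega_{2^{i-1}} \\ I_{2^{i-1}} & -\Omega_{2^{i-1}} \end{pmatrix}$ and $\Omega_{2^{i-1}} = \mathrm{Diag}\{1, \omega_{2^i}, \omega^2_{2^i}, \ldots, \omega^{2^{i-1}-1}_{2^i}\}$
with $\omega_{2^i} = e^{-2\pi\iota/2^i}$ and $\iota = \sqrt{-1}$. We have that $F_2 = W = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. The operator

$$\underline{F}_{2^n} = A_n A_{n-1} \cdots A_1 \tag{8}$$

represents the computational kernel of the Cooley-Tukey FFT while $P_{2^n}$ represents the permutation
which needs to be performed on the elements of the input vector before feeding that vector into
the computational kernel. Note that the presence of $P_{2^n}$ in (7) is due to the accumulation of its
factors, i.e., the terms $(I_{2^i} \otimes \Pi_{2^{n-i}})$, as given by (5).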

The Gentleman-Sande FFT factorization is obtained by exploiting the symmetry of $F_{2^n}$ and
transposing the Cooley-Tukey factorization [16], leading to

$$F_{2^n} = P_{2^n} A^t_1 \cdots A^t_{n-1} A^t_n = P_{2^n} \underline{F}^t_{2^n} \tag{9}$$

where

$$\underline{F}^t_{2^n} = A^t_1 \cdots A^t_{n-1} A^t_n \tag{10}$$

represents the computational kernel of the Gentleman-Sande FFT while $P_{2^n}$ represents the permutation
which needs to be performed to obtain the elements of the output vector in the correct
order.
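
Both factorizations can be checked against the DFT matrix for small n. The sketch below, in Python with NumPy, assumes the unitary normalization, i.e., a factor $1/\sqrt{2}$ in each butterfly block and $F_{jk} = \omega^{jk}/\sqrt{2^n}$ with $\omega = e^{-2\pi\iota/2^n}$. The function names are hypothetical and the code is a classical verification, not a circuit.

```python
import numpy as np

def bit_reversal(n):
    """2^n x 2^n bit-reversal permutation matrix."""
    N = 2 ** n
    P = np.zeros((N, N))
    for i in range(N):
        P[i, int(format(i, f"0{n}b")[::-1], 2)] = 1
    return P

def butterfly(i):
    """B_{2^i} = (1/sqrt 2) [[I, Omega], [I, -Omega]] with Omega = Diag(1, w, ..., w^(2^(i-1)-1))."""
    half = 2 ** (i - 1)
    w = np.exp(-2j * np.pi / 2 ** i)
    Omega = np.diag(w ** np.arange(half))
    I = np.eye(half)
    return np.block([[I, Omega], [I, -Omega]]) / np.sqrt(2)

def A(i, n):
    """A_i = I_{2^(n-i)} kron B_{2^i}."""
    return np.kron(np.eye(2 ** (n - i)), butterfly(i))

def cooley_tukey_kernel(n):
    """Computational kernel A_n A_{n-1} ... A_1 of (8)."""
    K = np.eye(2 ** n, dtype=complex)
    for i in range(n, 0, -1):
        K = K @ A(i, n)
    return K

def dft(n):
    """Unitary DFT matrix of dimension 2^n."""
    N = 2 ** n
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

n = 3
K, P, F = cooley_tukey_kernel(n), bit_reversal(n), dft(n)
print(np.allclose(K @ P, F))    # Cooley-Tukey: kernel after bit reversal of the input, as in (7)
print(np.allclose(P @ K.T, F))  # Gentleman-Sande: transposed kernel, then bit reversal, as in (9)
```

Note that `K.T` is the plain transpose, not the conjugate transpose, matching the superscript t in (9) and (10).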

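In [15] a quantum circuit for the implementation of $\underline{F}_{2^n}$, given by (8), is presented by developing
a factorization of the operators $B_{2^i}$ as

$$B_{2^i} = \frac{1}{\sqrt{2}} \begin{pmatrix} I_{2^{i-1}} & \Omega_{2^{i-1}} \\ I_{2^{i-1}} & -\Omega_{2^{i-1}} \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} I_{2^{i-1}} & I_{2^{i-1}} \\ I_{2^{i-1}} & -I_{2^{i-1}} \end{pmatrix} \begin{pmatrix} I_{2^{i-1}} & 0 \\ 0 & \Omega_{2^{i-1}} \end{pmatrix} \tag{11}$$

Let $C_{2^i} = \begin{pmatrix} I_{2^{i-1}} & 0 \\ 0 & \Omega_{2^{i-1}} \end{pmatrix}$. It then follows that

$$B_{2^i} = (W \otimes I_{2^{i-1}}) C_{2^i} \tag{12}$$

$$A_i = I_{2^{n-i}} \otimes B_{2^i} = (I_{2^{n-i}} \otimes W \otimes I_{2^{i-1}})(I_{2^{n-i}} \otimes C_{2^i}) \tag{13}$$

In [15] a factorization of the operators $C_{2^i}$ is developed as

$$C_{2^i} = \theta_{n-1,n-i}\, \theta_{n-2,n-i} \cdots \theta_{n-i+1,n-i} \tag{14}$$

where $\theta_{jk}$ is a two-bit gate acting on the jth and kth qubits.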

Using (13)-(14) a circuit for implementation of (8) is developed in [15] and presented in Fig. 1. 
However, there is an error in the corresponding figure in [15] since it implies that, with a correct 
ordering of the input qubits, the output qubits are obtained in a reverse order. Note that, as can 
be seen from (7), the operator $\underline{F}_{2^n}$ performs the FFT operation and provides the output qubits in
a correct order if the input qubits are presented in a reverse order.

The quantum circuit for the Gentleman-Sande FFT can be obtained from the circuit of Fig. 1 by
first reversing the order of the gates that build the operator block $A_i$ (and thus building the operators $A^t_i$)
and then reversing the order of the blocks representing the operators $A^t_i$. By using the Gentleman-Sande
circuit, with the input qubits in the correct order the output qubits are obtained in reverse
order.

For an efficient and correct implementation of the quantum FFT, one needs to take into account
the ordering of the input and output qubits, particularly if the FFT is used as a black box in a quantum
computation. If the FFT is used as a stand-alone block or as the last stage in the computation
(and hence its output is sampled directly), then it is more efficient to use the Gentleman-Sande
FFT since the ordering of the output qubits does not cause any problem. If the FFT is used as
the first stage of the computation, then it is more efficient to use the Cooley-Tukey factorization
by preparing the input qubits in a reverse order. Note that, as in classical computation, either
the Cooley-Tukey or the Gentleman-Sande FFT factorization, or a combination of the two, can be chosen in a
given quantum computation to avoid explicit implementation of $P_{2^n}$ (or any other mechanism)
for reversing the order of qubits and hence achieve a greater efficiency. As an example, in Sec.
4.3 we will show that the use of the Cooley-Tukey rather than the Gentleman-Sande factorization
leads to a greater efficiency in quantum implementation by eliminating the need for an explicit
implementation of $P_{2^n}$ (or any other mechanism) for reversing the order of qubits.

2.4 A Basic Quantum Gate for Efficient Implementation of Qubit Permutation Matrices

If a permutation matrix can be described by its effect on the ordering of the qubits then it might
be possible to devise circuits for its implementation directly. We call such permutation
matrices "Qubit Permutation Matrices". A set of efficient and practically realizable circuits for
implementation of Qubit Permutation Matrices can be built by using a new quantum gate, called
the qubit swap gate, $\Pi_4$, where

$$\Pi_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{15}$$

For quantum computation, $\Pi_4$ is the "qubit swap operator", i.e.,

$$\Pi_4 : |a_1\, a_0\rangle \mapsto |a_0\, a_1\rangle \tag{16}$$

The $\Pi_4$ gate, shown in Fig. 2.a, can be implemented with three XOR (or Controlled-NOT) gates
as shown in Fig. 2.b. The $\Pi_4$ gate offers two major advantages for practical implementation:

• It performs a local operation, i.e., swapping the two neighboring qubits. This locality can be
advantageous in practical realizations of quantum circuits, and

• Given the fact that $\Pi_4$ can be implemented using three XOR (or Controlled-NOT) gates,
it is possible to implement conditional operators involving $\Pi_4$, for example, operators of the
form $\Pi_4 \oplus I_{2^n-4}$, by using Controlled$^k$-NOT gates [21].
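
The three-XOR construction of the swap gate can be verified directly on 4 x 4 matrices. The sketch below uses Python with NumPy and assumes the basis ordering $|a_1 a_0\rangle \to$ index $2a_1 + a_0$; the variable names are hypothetical, and the particular ordering of controls and targets is one valid choice that may or may not match the layout drawn in Fig. 2.b.

```python
import numpy as np

# Basis ordering: |a1 a0> corresponds to index 2*a1 + a0.
# Controlled-NOT with control a1 and target a0.
CNOT_C1_T0 = np.array([[1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 0, 1],
                       [0, 0, 1, 0]])
# Controlled-NOT with control a0 and target a1.
CNOT_C0_T1 = np.array([[1, 0, 0, 0],
                       [0, 0, 0, 1],
                       [0, 0, 1, 0],
                       [0, 1, 0, 0]])
# Qubit swap gate: exchanges the basis states |01> and |10>.
PI4 = np.array([[1, 0, 0, 0],
                [0, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1]])

# Three alternating Controlled-NOT gates reproduce the swap.
three_xor = CNOT_C1_T0 @ CNOT_C0_T1 @ CNOT_C1_T0
print(np.array_equal(three_xor, PI4))
```

Starting with the other Controlled-NOT (control $a_0$ first) gives the same operator, so either orientation can be used.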

A circuit for implementation of $\Pi_{2^n}$ by using $\Pi_4$ gates is shown in Fig. 3. This circuit is based
on an intuitively simple idea of successive swapping of the neighboring qubits, and implements $\Pi_{2^n}$
with a complexity of $O(n)$ by using $O(n)$ $\Pi_4$ gates. It is interesting to note that this
circuit leads to a new (to our knowledge) factorization of $\Pi_{2^n}$ in terms of $\Pi_4$ as

$$\Pi_{2^n} = (I_{2^{n-2}} \otimes \Pi_4)(I_{2^{n-3}} \otimes \Pi_4 \otimes I_2) \cdots (I_{2^{n-i}} \otimes \Pi_4 \otimes I_{2^{i-2}}) \cdots (I_2 \otimes \Pi_4 \otimes I_{2^{n-3}})(\Pi_4 \otimes I_{2^{n-2}}) \tag{17}$$

This new factorization of $\Pi_{2^n}$ is less efficient than other schemes (see, for example, [16]) for a
classical implementation of $\Pi_{2^n}$. Interestingly, it is derived here as a result of our search for
an efficient quantum implementation of $\Pi_{2^n}$, and in this sense it is only efficient for a quantum
implementation. Note also that a new (to our knowledge) recursive factorization of $\Pi_{2^i}$ directly
results from Fig. 3 as

$$\Pi_{2^i} = (I_{2^{i-2}} \otimes \Pi_4)(\Pi_{2^{i-1}} \otimes I_2) \tag{18}$$
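
The swap-gate factorization and its recursive form can be checked numerically against the element-wise definition of the perfect shuffle. This is a minimal classical sketch in Python with NumPy; the helper names `perfect_shuffle`, `swap_factor` and `shuffle_from_swaps` are hypothetical, and the factors are multiplied in the left-to-right order in which they appear in (17).

```python
import numpy as np

PI4 = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])

def perfect_shuffle(n):
    """2^n x 2^n perfect shuffle matrix from the element-wise definition (1)."""
    N = 2 ** n
    P = np.zeros((N, N), dtype=int)
    for i in range(N):
        j = i // 2 if i % 2 == 0 else (i - 1) // 2 + N // 2
        P[i, j] = 1
    return P

def swap_factor(n, k):
    """General factor of (17): I_{2^(n-k)} kron Pi_4 kron I_{2^(k-2)}, for k = 2 .. n."""
    left = np.eye(2 ** (n - k), dtype=int)
    right = np.eye(2 ** (k - 2), dtype=int)
    return np.kron(np.kron(left, PI4), right)

def shuffle_from_swaps(n):
    """Multiply the n - 1 factors of (17), leftmost (k = 2) to rightmost (k = n)."""
    prod = np.eye(2 ** n, dtype=int)
    for k in range(2, n + 1):
        prod = prod @ swap_factor(n, k)
    return prod

n = 4
print(np.array_equal(shuffle_from_swaps(n), perfect_shuffle(n)))   # factorization (17)

# Recursive form (18): Pi_{2^i} = (I_{2^(i-2)} kron Pi_4)(Pi_{2^(i-1)} kron I_2).
i = 4
recursive = np.kron(np.eye(2 ** (i - 2), dtype=int), PI4) @ np.kron(perfect_shuffle(i - 1), np.eye(2, dtype=int))
print(np.array_equal(recursive, perfect_shuffle(i)))
```

The product uses n - 1 swap factors, in line with the $O(n)$ gate count stated above.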

A circuit for implementation of $P_{2^n}$ by using $\Pi_4$ gates is shown in Fig. 4. Again, this circuit
is based on an intuitively simple idea, that is, successive and parallel swapping of the neighboring
qubits, and implements $P_{2^n}$ with a complexity of $O(n)$ by using $O(n^2)$ $\Pi_4$ gates. This circuit leads
to a new (to our knowledge) factorization of $P_{2^n}$ in terms of $\Pi_4$ as

$$P_{2^n} = \Big( (\underbrace{\Pi_4 \otimes \Pi_4 \otimes \cdots \otimes \Pi_4}_{n/2}) (I_2 \otimes \underbrace{\Pi_4 \otimes \cdots \otimes \Pi_4}_{n/2 - 1} \otimes I_2) \Big)^{n/2} \tag{19}$$

for n even, and

$$P_{2^n} = \Big( (I_2 \otimes \underbrace{\Pi_4 \otimes \cdots \otimes \Pi_4}_{(n-1)/2}) (\underbrace{\Pi_4 \otimes \cdots \otimes \Pi_4}_{(n-1)/2} \otimes I_2) \Big)^{(n-1)/2} (I_2 \otimes \underbrace{\Pi_4 \otimes \cdots \otimes \Pi_4}_{(n-1)/2}) \tag{20}$$

for n odd.

It should be emphasized that this new factorization of $P_{2^n}$ is less efficient than other schemes,
e.g., the use of (5), for a classical implementation (see also [16] for further discussion). However,
this factorization is more efficient for a quantum implementation of $P_{2^n}$. In fact, a quantum implementation
of $P_{2^n}$ by using (5) and (17) will result in a complexity of $O(n^2)$ by using $O(n^2)$ $\Pi_4$
gates.
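
The parallel-swap factorization of the bit-reversal matrix can be checked for small n. The sketch below, in Python with NumPy, alternates two layers of disjoint neighboring swaps for n rounds in total, with the exponent taken as n/2 for even n and (n-1)/2 for odd n; the helper names are hypothetical. It also counts the swap gates, which comes to n(n-1)/2.

```python
import numpy as np
from functools import reduce

PI4 = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
I2 = np.eye(2, dtype=int)

def kron_all(mats):
    """Kronecker product of a list of matrices (the empty list gives the 1 x 1 identity)."""
    return reduce(np.kron, mats, np.eye(1, dtype=int))

def bit_reversal(n):
    """2^n x 2^n bit-reversal permutation matrix."""
    N = 2 ** n
    P = np.zeros((N, N), dtype=int)
    for i in range(N):
        P[i, int(format(i, f"0{n}b")[::-1], 2)] = 1
    return P

def reversal_from_swaps(n):
    """Alternating layers of neighboring swaps: n layers in total, each of depth one."""
    if n % 2 == 0:
        layer_a = kron_all([PI4] * (n // 2))
        layer_b = kron_all([I2] + [PI4] * (n // 2 - 1) + [I2])
        return np.linalg.matrix_power(layer_a @ layer_b, n // 2)
    m = (n - 1) // 2
    layer_low = kron_all([I2] + [PI4] * m)
    layer_high = kron_all([PI4] * m + [I2])
    return np.linalg.matrix_power(layer_low @ layer_high, m) @ layer_low

def swap_count(n):
    """Number of swap gates used by the layered circuit."""
    if n % 2 == 0:
        return (n // 2) * (n // 2 + n // 2 - 1)
    m = (n - 1) // 2
    return m * 2 * m + m

for n in range(2, 7):
    print(n, np.array_equal(reversal_from_swaps(n), bit_reversal(n)), swap_count(n))
```

Each layer acts on disjoint qubit pairs and can therefore be executed in one time step, which is where the $O(n)$ depth with $O(n^2)$ gates comes from.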

As will be shown, the development of complete and efficient circuits for implementation of
wavelet transforms requires a mechanism for implementation of conditional operators of the forms
$\Pi_{2^i} \oplus I_{2^n-2^i}$ and $P_{2^i} \oplus I_{2^n-2^i}$ for some $i$. The key enabling factor for a successful implementation of
such conditional operators is the use of factorizations similar to (17) and (19)-(20) or, alternatively,
circuits similar to those in Figures 3 and 4, along with the conditional operators involving $\Pi_4$ gates.
3 Quantum Wavelet Algorithms



3.1 Wavelet Pyramidal and Packet Algorithms 

Given a wavelet kernel, its corresponding wavelet transform is usually performed according to a
packet algorithm (PAA) or a pyramid algorithm (PYA). The first step in devising quantum counterparts
of these algorithms is the development of suitable factorizations. Consider the Daubechies
fourth-order wavelet kernel of dimension $2^i$, denoted as $D^{(4)}_{2^i}$. First level factorizations of PAA and
PYA for a $2^n$-dimensional vector are given as

$$PAA = (I_{2^{n-2}} \otimes D^{(4)}_4)(I_{2^{n-3}} \otimes \Pi_8) \cdots (I_{2^{n-i}} \otimes D^{(4)}_{2^i})(I_{2^{n-i-1}} \otimes \Pi_{2^{i+1}}) \cdots (I_2 \otimes D^{(4)}_{2^{n-1}})\, \Pi_{2^n} D^{(4)}_{2^n} \tag{21}$$

$$PYA = (D^{(4)}_4 \oplus I_{2^n-4})(\Pi_8 \oplus I_{2^n-8}) \cdots (D^{(4)}_{2^i} \oplus I_{2^n-2^i})(\Pi_{2^{i+1}} \oplus I_{2^n-2^{i+1}}) \cdots \Pi_{2^n} D^{(4)}_{2^n} \tag{22}$$

These factorizations allow a first level analysis of the feasibility and efficiency of quantum implementations
of the packet and pyramid algorithms. To see this, suppose we have a practically
realizable and efficient, i.e., $O(i)$, quantum algorithm for implementation of $D^{(4)}_{2^i}$. For the packet
algorithm, the operators $(I_{2^{n-i}} \otimes D^{(4)}_{2^i})$ can be directly and efficiently implemented by using the
algorithm for $D^{(4)}_{2^i}$. Also, using the factorization of $\Pi_{2^i}$, given by (17), the operators $(I_{2^{n-i}} \otimes \Pi_{2^i})$
can be implemented efficiently in $O(i)$.

For the pyramid algorithm, the existence of an algorithm for $D^{(4)}_{2^i}$ does not automatically imply
an efficient algorithm for implementation of the conditional operators $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$. An example
of such a case is discussed in Sec. 4.4. Thus, careful analysis is needed to establish both the
feasibility and efficiency of implementation of the conditional operators $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$ by using the
algorithm for $D^{(4)}_{2^i}$. Note, however, that the conditional operators $(\Pi_{2^i} \oplus I_{2^n-2^i})$ can be efficiently
implemented in $O(i)$ by using the factorization in (17) and the conditional $\Pi_4$ gates.

The above analysis can be extended to any wavelet kernel (WK) and summarized as follows: 

• Packet algorithm: A physically realizable and efficient algorithm for the WK along with 
the use of (17) leads to a physically realizable and efficient implementation of the packet 
algorithm. 

• Pyramid algorithm: A physically realizable and efficient algorithm for the WK does not
automatically lead to an implementation of the conditional operators involving WK (and
hence the pyramid algorithm) but the conditional operators $(\Pi_{2^i} \oplus I_{2^n-2^i})$ can be efficiently
implemented by using the factorization in (17) and the conditional $\Pi_4$ gates.
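
The difference between the two kinds of operators summarized above can be made concrete on a small example. The following Python/NumPy sketch uses a hypothetical 4 x 4 unitary `U` as a stand-in for a wavelet kernel (it is not the Daubechies kernel) and a hypothetical helper `direct_sum`. It shows that the Kronecker form acts on every block of the vector, while the direct-sum form acts on the first block only, which is why the latter corresponds to a conditional (controlled) operation.

```python
import numpy as np

def direct_sum(A, B):
    """Block-diagonal matrix diag(A, B), i.e. the direct sum of A and B."""
    out = np.zeros((A.shape[0] + B.shape[0], A.shape[1] + B.shape[1]))
    out[:A.shape[0], :A.shape[1]] = A
    out[A.shape[0]:, A.shape[1]:] = B
    return out

# Hypothetical stand-in for a 4 x 4 kernel: any unitary serves for the illustration.
W = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
U = np.kron(W, W)

x = np.arange(8, dtype=float)

packet_term = np.kron(np.eye(2), U)       # I_2 kron U: U acts on both length-4 blocks
pyramid_term = direct_sum(U, np.eye(4))   # U (+) I_4: U acts on the first block only

y_packet = packet_term @ x
y_pyramid = pyramid_term @ x

# For this size, U (+) I_4 is U applied only when the leading qubit is 0.
P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])
controlled = np.kron(P0, U) + np.kron(P1, np.eye(4))
print(np.allclose(controlled, pyramid_term))
```

In the Kronecker case the kernel circuit is simply applied to a subset of the qubits, whereas in the direct-sum case every gate of the kernel circuit has to be conditioned on the remaining qubits.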

3.2 Haar Wavelet Factorization and Implementation 

The Haar transform can be defined from the Haar functions [17]. Hoyer [20] used a recursive
definition of Haar matrices based on the generalized Kronecker product (see also [17] for similar
definitions) and developed a factorization of $H_{2^n}$ as

$$\begin{aligned} H_{2^n} = {} & (I_{2^{n-1}} \otimes W) \cdots (I_{2^{n-i}} \otimes W \oplus I_{2^n-2^{n-i+1}}) \cdots (W \oplus I_{2^n-2}) \times \\ & (\Pi_4 \oplus I_{2^n-4}) \cdots (\Pi_{2^i} \oplus I_{2^n-2^i}) \cdots (\Pi_{2^{n-1}} \oplus I_{2^{n-1}})\, \Pi_{2^n} \end{aligned} \tag{23}$$
Hoyer's circuit for implementation of (23) is shown in Fig. 5. However, this represents an incomplete
solution for quantum implementation and subsequent complexity analysis of the Haar transform.
To see this, let

$$H^{(1)}_{2^n} = (I_{2^{n-1}} \otimes W) \cdots (I_{2^{n-i}} \otimes W \oplus I_{2^n-2^{n-i+1}}) \cdots (W \oplus I_{2^n-2}) \tag{24}$$

$$H^{(2)}_{2^n} = (\Pi_4 \oplus I_{2^n-4}) \cdots (\Pi_{2^i} \oplus I_{2^n-2^i}) \cdots (\Pi_{2^{n-1}} \oplus I_{2^{n-1}})\, \Pi_{2^n} \tag{25}$$

Clearly, the operator $H^{(1)}_{2^n}$ can be implemented in $O(n)$ by using $O(n)$ conditional W gates. But the
feasibility of practical implementation of the operator $H^{(2)}_{2^n}$ and its complexity (and consequently
those of the factorization in (23)) cannot be assessed unless a mechanism for implementation of the
terms $(\Pi_{2^i} \oplus I_{2^n-2^i})$ is devised.

However, by using factorizations and circuits similar to (17) and Figure 3, it can be easily
shown that the operators $(\Pi_{2^i} \oplus I_{2^n-2^i})$ can be implemented in $O(i)$ by using $O(i)$ conditional $\Pi_4$
gates (or Controlled$^k$-NOT gates). This leads to the implementation of $H^{(2)}_{2^n}$ and consequently $H_{2^n}$
in $O(n^2)$ by using $O(n^2)$ gates. This represents not only the first practically feasible quantum circuit
for implementation of $H_{2^n}$ but also the first complete analysis of the time and space
(gate) complexity of its quantum implementation. Note that both operators $(I_{2^{n-i}} \otimes H_{2^i})$ and $(H_{2^i} \oplus I_{2^n-2^i})$ can be
directly and efficiently implemented by using the above algorithm and circuit for implementation of
$H_{2^i}$. This implies both the feasibility and efficiency of the quantum implementation of the packet
and pyramid algorithms by using our factorization for the Haar wavelet kernel.

3.3 Daubechies Wavelet and Hoyer's Factorization 

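The Daubechies fourth-order wavelet kernel of dimension $2^n$ is given in a matrix form as [22]

$$D^{(4)}_{2^n} = \begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & & & & & & \\
c_3 & -c_2 & c_1 & -c_0 & & & & & & \\
 & & c_0 & c_1 & c_2 & c_3 & & & & \\
 & & c_3 & -c_2 & c_1 & -c_0 & & & & \\
\vdots & & & & & \ddots & & & & \vdots \\
 & & & & & & c_0 & c_1 & c_2 & c_3 \\
 & & & & & & c_3 & -c_2 & c_1 & -c_0 \\
c_2 & c_3 & & & & & & & c_0 & c_1 \\
c_1 & -c_0 & & & & & & & c_3 & -c_2
\end{pmatrix} \tag{26}$$

where $c_0 = \frac{1+\sqrt{3}}{4\sqrt{2}}$, $c_1 = \frac{3+\sqrt{3}}{4\sqrt{2}}$, $c_2 = \frac{3-\sqrt{3}}{4\sqrt{2}}$ and $c_3 = \frac{1-\sqrt{3}}{4\sqrt{2}}$. For classical computation and
given its sparse structure, the application of $D^{(4)}_{2^n}$ can be performed with an optimal cost of $O(2^n)$.
However, the matrix $D^{(4)}_{2^n}$, as given by (26), is not suitable for a quantum implementation. To
achieve a feasible and efficient quantum implementation, a suitable factorization of $D^{(4)}_{2^n}$ needs to
be developed. Hoyer [20] proposed a factorization of $D^{(4)}_{2^n}$ as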

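$$
D^{(4)}_{2^n} = (I_{2^{n-1}} \otimes C_1)\, S_{2^n}\, (I_{2^{n-1}} \otimes C_0) \tag{27}
$$

where

$$
C_0 = 2\begin{pmatrix} c_4 & -c_2 \\ -c_2 & c_4 \end{pmatrix} \quad\text{and}\quad C_1 = \frac{1}{2}\begin{pmatrix} \frac{c_0}{c_4} & 1 \\ 1 & \frac{c_1}{c_2} \end{pmatrix} \tag{28}
$$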



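and $S_{2^n}$ is a permutation matrix with a classical description given by

$$
S_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } i \text{ is even, or if } i + 2 = j \pmod{2^n} \text{ and } i \text{ is odd} \\ 0 & \text{otherwise} \end{cases} \tag{29}
$$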
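The classical description (29) can be checked on a small case. The following is a minimal sketch, assuming zero-based row and column indices and assuming that the second case of (29) applies to odd $i$ only (otherwise the matrix would not be a permutation). The helper names `s_matrix`, `is_permutation` and `image_of` are illustrative and do not come from the paper.

```python
def s_matrix(n):
    """Permutation matrix S of (29) for dimension 2**n, with zero-based indices."""
    size = 2 ** n
    s = [[0] * size for _ in range(size)]
    for i in range(size):
        # Even rows keep the diagonal entry; odd rows (assumed) use j = i + 2 mod 2**n.
        j = i if i % 2 == 0 else (i + 2) % size
        s[i][j] = 1
    return s


def is_permutation(m):
    """True when every row and every column contains exactly one 1."""
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(col) == 1 for col in zip(*m))
    return rows_ok and cols_ok


def image_of(m, j):
    """Index of the basis state to which the matrix sends basis state j."""
    return next(i for i in range(len(m)) if m[i][j] == 1)


S8 = s_matrix(3)
images = [image_of(S8, j) for j in range(8)]
print(is_permutation(S8), images)  # True [0, 7, 2, 1, 4, 3, 6, 5]
```

Under these assumptions, even basis labels are left in place and each odd label $j$ is sent to $j - 2 \pmod{2^n}$.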

Hoyer's block-level circuit for implementation of (27) is shown in Figure 6. Clearly, the main issue
for a practical quantum implementation and subsequent complexity analysis of (27) is the quantum
implementation of the matrix $S_{2^n}$. To this end, Hoyer discovered a quantum arithmetic description of
$S_{2^n}$ as

$$
S_{2^n}: |a_{n-1} a_{n-2} \cdots a_1 a_0\rangle \mapsto |b_{n-1} b_{n-2} \cdots b_1 b_0\rangle \tag{30}
$$

where

$$
b = \begin{cases} a - 2 \pmod{2^n}, & \text{if } a \text{ is odd} \\ a, & \text{otherwise} \end{cases} \tag{31}
$$

with $a$ and $b$ the integers whose binary expansions are $a_{n-1} \cdots a_0$ and $b_{n-1} \cdots b_0$, in agreement with (29).
As suggested by Hoyer, this description of $S_{2^n}$ then allows its quantum implementation by using the
quantum arithmetic circuits of [18] with a complexity of $O(n)$. This algorithm can be directly
extended for implementation of the operators $(I_{2^{n-i}} \otimes D^{(4)}_{2^i})$ and hence the packet algorithm.
However, the feasibility and efficiency of an implementation of the operators $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$ and thus
the pyramid algorithm need further analysis.
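Any factorization of the kernel has to reproduce a unitary (here real orthogonal) matrix, and that property of (26) can be checked numerically. The sketch below assumes the standard Daubechies coefficients $c_0 = (1+\sqrt{3})/(4\sqrt{2})$, $c_1 = (3+\sqrt{3})/(4\sqrt{2})$, $c_2 = (3-\sqrt{3})/(4\sqrt{2})$, $c_3 = (1-\sqrt{3})/(4\sqrt{2})$ and the wrap-around layout of the last two rows of (26). The function names are illustrative only.

```python
from math import sqrt


def daubechies4_kernel(n):
    """Build the 2**n x 2**n kernel of (26), n >= 2, with wrap-around in the last two rows."""
    size = 2 ** n
    d = 4 * sqrt(2)
    c0, c1, c2, c3 = (1 + sqrt(3)) / d, (3 + sqrt(3)) / d, (3 - sqrt(3)) / d, (1 - sqrt(3)) / d
    low = [c0, c1, c2, c3]      # even rows of (26)
    high = [c3, -c2, c1, -c0]   # odd rows of (26)
    kernel = [[0.0] * size for _ in range(size)]
    for k in range(size // 2):
        for t in range(4):
            col = (2 * k + t) % size  # columns wrap around modulo 2**n
            kernel[2 * k][col] += low[t]
            kernel[2 * k + 1][col] += high[t]
    return kernel


def max_deviation_from_identity(m):
    """Largest entry of |M M^T - I|, which is zero for an orthogonal matrix."""
    size = len(m)
    worst = 0.0
    for r in range(size):
        for c in range(size):
            dot = sum(m[r][k] * m[c][k] for k in range(size))
            worst = max(worst, abs(dot - (1.0 if r == c else 0.0)))
    return worst


D8 = daubechies4_kernel(3)
print(max_deviation_from_identity(D8) < 1e-12)  # True
```

The check only confirms orthogonality of the kernel under the stated coefficient values; it does not test Hoyer's factors $C_0$ and $C_1$.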



4 Fast Quantum Algorithms and Circuits for Implementation of 
Daubechies Wavelet 

In this section, we develop a new factorization of the Daubechies wavelet. This factorization
leads to three new and efficient circuits, including one using the circuit for the QFT, for implementation
of the Daubechies wavelet.



4.1 A New Factorization of Daubechies Wavelet 

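We develop a new factorization of the Daubechies wavelet transform by showing that the
permutation matrix $S_{2^n}$ can be written as a product of two permutation matrices as

$$
S_{2^n} = Q_{2^n} R_{2^n} \tag{32}
$$

where $Q_{2^n}$ is the downshift permutation matrix [16] given by

$$
Q_{2^n} = \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
1 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix} \tag{33}
$$

and $R_{2^n}$ is a permutation matrix given by

$$
R_{2^n} = \begin{pmatrix}
0 & 1 & & & & & \\
1 & 0 & & & & & \\
& & 0 & 1 & & & \\
& & 1 & 0 & & & \\
& & & & \ddots & & \\
& & & & & 0 & 1 \\
& & & & & 1 & 0
\end{pmatrix} \tag{34}
$$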



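The matrix $R_{2^n}$ can be written as

$$
R_{2^n} = I_{2^{n-1}} \otimes N \tag{35}
$$

where $N = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Substituting (35) and (32) into (27), a new factorization of $D^{(4)}_{2^n}$ is derived
as

$$
D^{(4)}_{2^n} = (I_{2^{n-1}} \otimes C_1)\, Q_{2^n}\, (I_{2^{n-1}} \otimes N)(I_{2^{n-1}} \otimes C_0) = (I_{2^{n-1}} \otimes C_1)\, Q_{2^n}\, (I_{2^{n-1}} \otimes C_0') \tag{36}
$$

where

$$
C_0' = N \cdot C_0 = 2\begin{pmatrix} -c_2 & c_4 \\ c_4 & -c_2 \end{pmatrix} \tag{37}
$$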



Fig. 7 shows a block-level implementation of (36). Clearly, the main issue for a practical quantum
gate-level implementation and subsequent complexity analysis of (36) is the quantum implementation
of the matrix $Q_{2^n}$. In the following, we present three circuits for quantum implementation of
the matrix $Q_{2^n}$.
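The product form $S_{2^n} = Q_{2^n} R_{2^n}$ with $R_{2^n} = I_{2^{n-1}} \otimes N$ can be confirmed on a small instance. The sketch below assumes zero-based indices, takes $Q_{2^n}$ with ones on the superdiagonal and a wrap-around 1 in the bottom-left corner, and takes $S_{2^n}$ with its second case restricted to odd rows. All helper names are illustrative.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    """Kronecker (direct) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def downshift(n):
    """Q: entry (r, c) is 1 exactly when c = r + 1 modulo 2**n."""
    size = 2 ** n
    return [[1 if c == (r + 1) % size else 0 for c in range(size)] for r in range(size)]


def s_matrix(n):
    """S: diagonal entry on even rows, entry (i, i + 2 mod 2**n) on odd rows (assumed)."""
    size = 2 ** n
    return [[1 if c == (r if r % 2 == 0 else (r + 2) % size) else 0
             for c in range(size)] for r in range(size)]


N = [[0, 1], [1, 0]]
n = 3
Q = downshift(n)
R = kron(identity(2 ** (n - 1)), N)  # block diagonal with 2x2 swap blocks
product_matches = matmul(Q, R) == s_matrix(n)
print(product_matches)  # True
```

Because all entries are integers, the comparison is exact; the same check passes for other small values of `n`.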



4.2 Quantum Arithmetic Implementation of Permutation Matrix $Q_{2^n}$



A first circuit for implementation of the matrix $Q_{2^n}$ is developed based on its description as a quantum
arithmetic operator. We have discovered such a quantum arithmetic description of $Q_{2^n}$ as

$$
Q_{2^n}: |a_{n-1} a_{n-2} \cdots a_1 a_0\rangle \mapsto |b_{n-1} b_{n-2} \cdots b_1 b_0\rangle \tag{38}
$$

where

$$
b = a - 1 \pmod{2^n} \tag{39}
$$

with $a$ and $b$ the integers whose binary expansions are $a_{n-1} \cdots a_0$ and $b_{n-1} \cdots b_0$.
This description of $Q_{2^n}$ allows its quantum implementation by using the quantum arithmetic circuits of
[18] with a complexity of $O(n)$. Note, however, that the arithmetic description of $Q_{2^n}$ is simpler
than that of $S_{2^n}$ since it does not involve conditional quantum arithmetic operations (i.e., the same
operation is applied to every basis state, whatever the parity of its label). This algorithm for quantum
implementation of $Q_{2^n}$ and hence $D^{(4)}_{2^n}$ can be directly extended for implementation of the operators
$(I_{2^{n-i}} \otimes D^{(4)}_{2^i})$ and hence the packet algorithm. However, the feasibility and efficiency of an
implementation of the operators $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$ and thus the pyramid algorithm need further analysis.
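The arithmetic reading of $Q_{2^n}$ as a decrement modulo $2^n$ can be illustrated classically. The sketch below is not the quantum arithmetic circuit of [18]; it is a plain ripple-borrow decrement on a bit list, written only to show that the unconditional map $a \mapsto a - 1 \pmod{2^n}$ agrees with the matrix in (33). The helper names and the little-endian bit convention are assumptions made for the example.

```python
def to_bits(a, n):
    """Little-endian bit list: element k is the bit a_k of weight 2**k."""
    return [(a >> k) & 1 for k in range(n)]


def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))


def decrement_bits(bits):
    """Subtract 1 modulo 2**n by flipping bits from a_0 upward until a 1 is cleared."""
    out = list(bits)
    for k in range(len(out)):
        out[k] ^= 1         # flip the current bit
        if out[k] == 0:     # the bit was 1, so the borrow stops here
            break
    return out


def downshift(n):
    """Q of (33): entry (r, c) is 1 exactly when c = r + 1 modulo 2**n."""
    size = 2 ** n
    return [[1 if c == (r + 1) % size else 0 for c in range(size)] for r in range(size)]


n = 3
size = 2 ** n
Q = downshift(n)
images = [from_bits(decrement_bits(to_bits(a, n))) for a in range(size)]
# Column a of Q has its single 1 in row images[a] when the two descriptions agree.
matches_matrix = all(Q[images[a]][a] == 1 for a in range(size))
print(images, matches_matrix)  # [7, 0, 1, 2, 3, 4, 5, 6] True
```

The loop touches each bit at most once, which mirrors the linear dependence on $n$ stated above.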
4.3 Quantum FFT Factorization of Permutation Matrix $Q_{2^n}$

A direct and efficient factorization and subsequent circuit for implementation of $Q_{2^n}$ (and hence
the Daubechies wavelet) can be derived by using the FFT algorithm. This factorization is based
on the observation that $Q_{2^n}$ can be described in terms of the FFT as [16]

$$
Q_{2^n} = F_{2^n} T_{2^n} F^*_{2^n} \tag{40}
$$

where $T_{2^n}$ is a diagonal matrix given as $T_{2^n} = \mathrm{Diag}\{1, \omega_{2^n}, \omega_{2^n}^2, \ldots, \omega_{2^n}^{2^n-1}\}$ with $\omega_{2^n} = e^{-2i\pi/2^n}$
($*$ indicates conjugate transpose). As will be seen, it is more efficient to use the Cooley-Tukey
factorization, given by (7), and write (40) as

$$
Q_{2^n} = \underline{F}_{2^n} P_{2^n} T_{2^n} P_{2^n} \underline{F}^*_{2^n} \tag{41}
$$

It can be shown that the matrix $T_{2^n}$ has a factorization as

$$
T_{2^n} = (G(\omega_{2^n}^{2^{n-1}}) \otimes I_{2^{n-1}}) \cdots (I_{2^{i-1}} \otimes G(\omega_{2^n}^{2^{n-i}}) \otimes I_{2^{n-i}}) \cdots (I_{2^{n-1}} \otimes G(\omega_{2^n})) \tag{42}
$$

where $G(\omega_{2^n}) = \mathrm{Diag}\{1, \omega_{2^n}\} = \begin{pmatrix} 1 & 0 \\ 0 & \omega_{2^n} \end{pmatrix}$. This factorization leads to an efficient implementation
of $T_{2^n}$ by using $n$ single-qubit $G$ gates as shown in Fig. 8. Together with the circuit for
implementation of $P_{2^n}$ (Fig. 4) and the circuit for implementation of the FFT (Fig. 1), they represent
a complete gate-level implementation of $D^{(4)}_{2^n}$.
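Both the FFT description of $Q_{2^n}$ and the single-qubit factorization of $T_{2^n}$ can be checked numerically for a small $n$. The sketch below assumes $\omega_{2^n} = e^{-2i\pi/2^n}$, the normalized Fourier matrix $F_{jk} = \omega^{jk}/\sqrt{2^n}$ with zero-based indices, and $Q_{2^n}$ with ones on the superdiagonal and in the bottom-left corner. The product in (42) is formed as a single Kronecker product of the $2 \times 2$ gates, which equals the product of the commuting single-qubit operators. Helper names are illustrative.

```python
import cmath


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def conj_transpose(m):
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def max_abs_diff(a, b):
    return max(abs(a[r][c] - b[r][c]) for r in range(len(a)) for c in range(len(a[0])))


n = 3
size = 2 ** n
w = cmath.exp(-2j * cmath.pi / size)  # assumed sign convention for omega

F = [[w ** (r * c) / size ** 0.5 for c in range(size)] for r in range(size)]
T = [[w ** r if r == c else 0 for c in range(size)] for r in range(size)]
Q = [[1 if c == (r + 1) % size else 0 for c in range(size)] for r in range(size)]

# (40): F T F* should reproduce the downshift permutation.
err_fft = max_abs_diff(matmul(matmul(F, T), conj_transpose(F)), Q)

# (42): T as the Kronecker product G(w^(2^(n-1))) x ... x G(w).
T_from_gates = [[1]]
for i in range(1, n + 1):
    T_from_gates = kron(T_from_gates, [[1, 0], [0, w ** (2 ** (n - i))]])
err_gates = max_abs_diff(T_from_gates, T)

print(err_fft < 1e-12, err_gates < 1e-12)  # True True
```

The first check holds for either sign of the exponent in $\omega$, provided $F$ and $T$ use the same root of unity.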

However, a more efficient circuit can be derived by avoiding the explicit implementation of $P_{2^n}$
by showing that the operator

$$
P_{2^n} T_{2^n} P_{2^n} = P_{2^n} (G(\omega_{2^n}^{2^{n-1}}) \otimes I_{2^{n-1}}) \cdots (I_{2^{i-1}} \otimes G(\omega_{2^n}^{2^{n-i}}) \otimes I_{2^{n-i}}) \cdots (I_{2^{n-1}} \otimes G(\omega_{2^n})) P_{2^n} \tag{43}
$$

can be efficiently implemented by simply reversing the order of gates in Fig. 8. This is established
by the following lemma:

Lemma 1.

$$
P_{2^n} (G(\omega_{2^n}^{2^{n-1}}) \otimes I_{2^{n-1}}) = (I_{2^{n-1}} \otimes G(\omega_{2^n}^{2^{n-1}})) P_{2^n} \tag{44}
$$

$$
P_{2^n} (I_{2^{i-1}} \otimes G(\omega_{2^n}^{2^{n-i}}) \otimes I_{2^{n-i}}) = (I_{2^{n-i}} \otimes G(\omega_{2^n}^{2^{n-i}}) \otimes I_{2^{i-1}}) P_{2^n} \tag{45}
$$

$$
P_{2^n} (I_{2^{n-1}} \otimes G(\omega_{2^n})) = (G(\omega_{2^n}) \otimes I_{2^{n-1}}) P_{2^n} \tag{46}
$$

Proof. This lemma can be easily proved based on the physical interpretation of operations in (44)-(46).
The left-hand side of (44) implies first an operation, i.e., application of $G(\omega_{2^n}^{2^{n-1}})$, on the last
qubit and then application of $P_{2^n}$ on all the qubits, i.e., reversing the order of qubits. However, this
is equivalent to first reversing the order of qubits, i.e., applying $P_{2^n}$, and then applying $G(\omega_{2^n}^{2^{n-1}})$
on the first qubit, which is the operation described by the right-hand side of (44). Similarly, the
left-hand side of (45) implies first application of $G(\omega_{2^n}^{2^{n-i}})$ on the $(n-i)$th qubit and then reversing
the order of qubits. This is equivalent to first reversing the order of qubits and then applying
$G(\omega_{2^n}^{2^{n-i}})$ on the $i$th qubit, which is the operation described by the right-hand side of (45). In the
same fashion, the left-hand side of (46) implies first application of $G(\omega_{2^n})$ on the first qubit and
then reversing the order of qubits, which is equivalent to first reversing the order of qubits and then
applying $G(\omega_{2^n})$ on the last qubit, that is, the operation in the right-hand side of (46).

Applying (44)-(46) to (43) from left to right and noting that, due to the symmetry of $P_{2^n}$, we
have $P_{2^n} P_{2^n} = I_{2^n}$, it then follows that

$$
P_{2^n} T_{2^n} P_{2^n} = (I_{2^{n-1}} \otimes G(\omega_{2^n}^{2^{n-1}})) \cdots (I_{2^{n-i}} \otimes G(\omega_{2^n}^{2^{n-i}}) \otimes I_{2^{i-1}}) \cdots (G(\omega_{2^n}) \otimes I_{2^{n-1}}) \tag{47}
$$

The circuit for implementation of (47) is shown in Fig. 9 which, as can be seen, has been obtained
by reversing the order of gates in Fig. 8. Note that the use of (47), which is a direct consequence
of using the Cooley-Tukey factorization, enables the implementation of (40) without explicit
implementation of $P_{2^n}$.
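The effect of conjugating by the bit reversal permutation can be verified numerically. The sketch below assumes that $P_{2^n}$ sends the basis state with index $a$ to the index obtained by reversing the $n$ bits of $a$, and that $\omega_{2^n} = e^{-2i\pi/2^n}$. It checks one instance of Lemma 1, namely (44), and the gate-order reversal of (47), with the operator products written as Kronecker products. Helper names are illustrative.

```python
import cmath


def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def kron_all(factors):
    out = [[1]]
    for f in factors:
        out = kron(out, f)
    return out


def max_abs_diff(a, b):
    return max(abs(a[r][c] - b[r][c]) for r in range(len(a)) for c in range(len(a[0])))


def bit_reversal_matrix(n):
    """P: entry (r, c) is 1 when c is r with its n bits written in reverse order."""
    size = 2 ** n
    rev = lambda a: int(format(a, "0{}b".format(n))[::-1], 2)
    return [[1 if c == rev(r) else 0 for c in range(size)] for r in range(size)]


n = 3
size = 2 ** n
w = cmath.exp(-2j * cmath.pi / size)
P = bit_reversal_matrix(n)

# Gates of (42), listed from the leftmost tensor position to the rightmost.
gates = [[[1, 0], [0, w ** (2 ** (n - i))]] for i in range(1, n + 1)]
T = kron_all(gates)

# (47): P T P equals the same gates placed in reversed order.
err_47 = max_abs_diff(matmul(matmul(P, T), P), kron_all(gates[::-1]))

# (44): P (G x I) = (I x G) P for the first gate.
G, half = gates[0], identity(size // 2)
err_44 = max_abs_diff(matmul(P, kron(G, half)), matmul(kron(half, G), P))

print(err_47 < 1e-12, err_44 < 1e-12)  # True True
```

Since $P_{2^n}$ only relabels tensor positions, the same identities hold for any choice of $2 \times 2$ gates, not only for the $G$ gates used here.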

Using (40) and (47), the complexity of the implementation of $Q_{2^n}$ and thus $D^{(4)}_{2^n}$ is the same
as that of the quantum FFT, that is, $O(n^2)$ for an exact implementation and $O(nm)$ for an
approximation of order $m$ [15]. Note that, by using (47), (40), and (36), both operators $(I_{2^{n-i}} \otimes D^{(4)}_{2^i})$
and $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$ can be directly implemented. This implies both the feasibility and efficiency
of the quantum implementation of the packet and pyramid algorithms by using this algorithm for
quantum implementation of $D^{(4)}_{2^n}$.

4.4 A Direct Recursive Factorization of Permutation Matrix $Q_{2^n}$

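A new direct and recursive factorization of $Q_{2^n}$ can be derived based on a similarity transformation
of $Q_{2^n}$ by using $\Pi_{2^n}$ as

$$
\Pi^t_{2^n} Q_{2^n} \Pi_{2^n} = \begin{pmatrix} 0 & I_{2^{n-1}} \\ Q_{2^{n-1}} & 0 \end{pmatrix} \tag{48}
$$

which can be written as

$$
\Pi^t_{2^n} Q_{2^n} \Pi_{2^n} = \begin{pmatrix} 0 & I_{2^{n-1}} \\ I_{2^{n-1}} & 0 \end{pmatrix} \begin{pmatrix} Q_{2^{n-1}} & 0 \\ 0 & I_{2^{n-1}} \end{pmatrix} = (N \otimes I_{2^{n-1}})(Q_{2^{n-1}} \oplus I_{2^{n-1}}) \tag{49}
$$

from which $Q_{2^n}$ can be calculated as

$$
Q_{2^n} = \Pi_{2^n} (N \otimes I_{2^{n-1}})(Q_{2^{n-1}} \oplus I_{2^{n-1}}) \Pi^t_{2^n} \tag{50}
$$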
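One level of this recursion can be checked with integer matrices. The sketch below assumes a specific convention for the perfect shuffle, which is defined earlier in the paper and not restated here: $\Pi_{2^n}$ is taken as the permutation matrix that sends the basis index $b \cdot 2^{n-1} + x$ (with $b \in \{0, 1\}$) to $2x + b$. With the opposite convention the roles of $\Pi_{2^n}$ and $\Pi^t_{2^n}$ would be exchanged. Helper names are illustrative.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def direct_sum(a, b):
    """Block diagonal matrix with a in the top-left and b in the bottom-right."""
    top = [row + [0] * len(b[0]) for row in a]
    bottom = [[0] * len(a[0]) + row for row in b]
    return top + bottom


def transpose(m):
    return [list(col) for col in zip(*m)]


def downshift(n):
    size = 2 ** n
    return [[1 if c == (r + 1) % size else 0 for c in range(size)] for r in range(size)]


def perfect_shuffle(n):
    """Assumed convention: basis index b * 2**(n-1) + x is sent to 2 * x + b."""
    size, half = 2 ** n, 2 ** (n - 1)
    p = [[0] * size for _ in range(size)]
    for b in range(2):
        for x in range(half):
            p[2 * x + b][b * half + x] = 1
    return p


N = [[0, 1], [1, 0]]
n = 3
half = 2 ** (n - 1)
Pi = perfect_shuffle(n)
Pi_t = transpose(Pi)

# Right-hand side of (49): (N x I)(Q_{2^(n-1)} + I).
middle = matmul(kron(N, identity(half)), direct_sum(downshift(n - 1), identity(half)))

similarity_holds = matmul(matmul(Pi_t, downshift(n)), Pi) == middle   # (48)-(49)
recursion_holds = matmul(matmul(Pi, middle), Pi_t) == downshift(n)    # (50)
print(similarity_holds, recursion_holds)  # True True
```

Applying the same step to `downshift(n - 1)` reproduces the next level of the recursion.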

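Substituting a similar factorization of $Q_{2^{n-1}}$ into (50), we get

$$
Q_{2^n} = \Pi_{2^n}(N \otimes I_{2^{n-1}})\left(\Pi_{2^{n-1}}(N \otimes I_{2^{n-2}})(Q_{2^{n-2}} \oplus I_{2^{n-2}})\Pi^t_{2^{n-1}} \oplus I_{2^{n-1}}\right)\Pi^t_{2^n} \tag{51}
$$

By using the identity

$$
\Pi_{2^{n-1}} A\, \Pi^t_{2^{n-1}} \oplus I_{2^{n-1}} = (I_2 \otimes \Pi_{2^{n-1}})(A \oplus I_{2^{n-1}})(I_2 \otimes \Pi^t_{2^{n-1}}) \tag{52}
$$

for any matrix $A \in \mathbb{R}^{2^{n-1} \times 2^{n-1}}$, (51) can then be written as

$$
Q_{2^n} = \Pi_{2^n}(N \otimes I_{2^{n-1}})(I_2 \otimes \Pi_{2^{n-1}})\left((N \otimes I_{2^{n-2}})(Q_{2^{n-2}} \oplus I_{2^{n-2}}) \oplus I_{2^{n-1}}\right)(I_2 \otimes \Pi^t_{2^{n-1}})\Pi^t_{2^n} \tag{53}
$$

Using the identity

$$
\begin{aligned}
(N \otimes I_{2^{n-2}})(Q_{2^{n-2}} \oplus I_{2^{n-2}}) \oplus I_{2^{n-1}} &= (N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}})(Q_{2^{n-2}} \oplus I_{2^{n-2}} \oplus I_{2^{n-1}}) \\
&= (N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}})(Q_{2^{n-2}} \oplus I_{3 \cdot 2^{n-2}})
\end{aligned} \tag{54}
$$

(53) is now written as

$$
Q_{2^n} = \Pi_{2^n}(N \otimes I_{2^{n-1}})(I_2 \otimes \Pi_{2^{n-1}})(N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}})(Q_{2^{n-2}} \oplus I_{3 \cdot 2^{n-2}})(I_2 \otimes \Pi^t_{2^{n-1}})\Pi^t_{2^n} \tag{55}
$$

Repeating the same procedure for all $Q_{2^i}$, for $i = n-3$ to $1$, and noting that $Q_2 = N$, it then
follows that

$$
\begin{aligned}
Q_{2^n} = {} & \Pi_{2^n}(N \otimes I_{2^{n-1}})(I_2 \otimes \Pi_{2^{n-1}})(N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}})(I_4 \otimes \Pi_{2^{n-2}})(N \otimes I_{2^{n-3}} \oplus I_{2^n-2^{n-2}}) \cdots \\
& (I_{2^{n-2}} \otimes \Pi_4)(N \otimes I_2 \oplus I_{2^n-4})(N \oplus I_{2^n-2})(I_{2^{n-2}} \otimes \Pi^t_4) \cdots (I_2 \otimes \Pi^t_{2^{n-1}})\Pi^t_{2^n}
\end{aligned} \tag{56}
$$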

The above expression of $Q_{2^n}$ can be further simplified by exploiting the fact that (see Appendix
for the proof) every operator of the form $(I_{2^i} \otimes \Pi_{2^{n-i}})$, for $i = n-2$ to $1$, commutes with all
operators of the form $(N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}})$, for $j = i$ to $1$. Using this commutative property,
(56) can now be written as

$$
\begin{aligned}
Q_{2^n} = {} & \Pi_{2^n}(I_2 \otimes \Pi_{2^{n-1}})(I_4 \otimes \Pi_{2^{n-2}}) \cdots (I_{2^{n-2}} \otimes \Pi_4)(N \otimes I_{2^{n-1}})(N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}}) \cdots \\
& (N \otimes I_2 \oplus I_{2^n-4})(N \oplus I_{2^n-2})(I_{2^{n-2}} \otimes \Pi^t_4) \cdots (I_2 \otimes \Pi^t_{2^{n-1}})\Pi^t_{2^n}
\end{aligned} \tag{57}
$$

Using the factorization of $P_{2^n}$ given in (5), we then have

$$
Q_{2^n} = P_{2^n}(N \otimes I_{2^{n-1}})(N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}}) \cdots (N \otimes I_2 \oplus I_{2^n-4})(N \oplus I_{2^n-2})P_{2^n} \tag{58}
$$

Substituting (58) into (36), a factorization of $D^{(4)}_{2^n}$ is then obtained as

$$
D^{(4)}_{2^n} = (I_{2^{n-1}} \otimes C_1)P_{2^n}(N \otimes I_{2^{n-1}})(N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}}) \cdots (N \otimes I_2 \oplus I_{2^n-4})(N \oplus I_{2^n-2})P_{2^n}(I_{2^{n-1}} \otimes C_0') \tag{59}
$$

Using Lemma 1, it then follows that

$$
D^{(4)}_{2^n} = P_{2^n}(C_1 \otimes I_{2^{n-1}})(N \otimes I_{2^{n-1}})(N \otimes I_{2^{n-2}} \oplus I_{2^{n-1}}) \cdots (N \otimes I_2 \oplus I_{2^n-4})(N \oplus I_{2^n-2})(C_0' \otimes I_{2^{n-1}})P_{2^n} \tag{60}
$$

A circuit for implementation of $D^{(4)}_{2^n}$, based on (60), is shown in Fig. 10. Together with the
circuit for implementation of $P_{2^n}$, shown in Fig. 4, they represent a complete gate-level circuit for
implementation of $D^{(4)}_{2^n}$ with an optimal complexity of $O(n)$.
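The closed form (58) can be checked directly, without reference to the perfect shuffle. The sketch below assumes that $P_{2^n}$ is the bit reversal permutation on $n$-bit indices and that $\otimes$ binds more tightly than $\oplus$, so that the $j$th factor is $(N \otimes I_{2^{n-j}}) \oplus I_{2^n - 2^{n-j+1}}$. It builds the $n$ conditional factors and compares $P_{2^n}(\cdots)P_{2^n}$ with the downshift matrix. Helper names are illustrative.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def direct_sum(a, b):
    top = [row + [0] * len(b[0]) for row in a]
    bottom = [[0] * len(a[0]) + row for row in b]
    return top + bottom


def downshift(n):
    size = 2 ** n
    return [[1 if c == (r + 1) % size else 0 for c in range(size)] for r in range(size)]


def bit_reversal_matrix(n):
    size = 2 ** n
    rev = lambda a: int(format(a, "0{}b".format(n))[::-1], 2)
    return [[1 if c == rev(r) else 0 for c in range(size)] for r in range(size)]


N = [[0, 1], [1, 0]]


def conditional_flip(n, j):
    """(N x I_{2^(n-j)}) + I_{2^n - 2^(n-j+1)} for j = 1, ..., n."""
    block = kron(N, identity(2 ** (n - j)))
    rest = 2 ** n - 2 ** (n - j + 1)
    return direct_sum(block, identity(rest)) if rest > 0 else block


n = 4
P = bit_reversal_matrix(n)
factors = [conditional_flip(n, j) for j in range(1, n + 1)]  # left to right as in (58)
middle = identity(2 ** n)
for f in factors:
    middle = matmul(middle, f)
rebuilt = matmul(matmul(P, middle), P)
print(rebuilt == downshift(n), len(factors))  # True 4
```

The middle part uses exactly $n$ factors, one per qubit, in line with the linear gate count stated for (60).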

Using (60) and (19)-(20), the operators $(I_{2^{n-i}} \otimes D^{(4)}_{2^i})$ can be directly and efficiently implemented
with a complexity of $O(i)$. This implies both the feasibility and efficiency of the implementation of
the packet algorithm by using this algorithm for the $D^{(4)}_{2^n}$ wavelet kernel. However, this algorithm is less
efficient for implementation of the operators $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$ and hence the pyramid algorithm. To
see this, note that the implementation of the operators $(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$, by using (60), requires the
implementation of the conditional operators $(P_{2^i} \oplus I_{2^n-2^i})$. However, these conditional operators
cannot be directly implemented by using (19) and (20). An alternative solution is to use the
factorization of $P_{2^i}$ in (5) and the conditional operators $(\Pi_{2^i} \oplus I_{2^n-2^i})$. However, this leads to a
complexity of $O(i^2)$ for implementation of the operators $(P_{2^i} \oplus I_{2^n-2^i})$ and hence the operators
$(D^{(4)}_{2^i} \oplus I_{2^n-2^i})$. Therefore, while (60) is optimal for implementation of $D^{(4)}_{2^n}$ and the packet algorithm, it
is not efficient for implementation of the pyramid algorithm.

It should be emphasized that this recursive factorization of $Q_{2^n}$, originated by the similarity
transformation in (48) and given by (56) and (58), was not previously known in classical computing.
Note that the permutation matrices $\Pi_{2^n}$ and, particularly, $P_{2^n}$ are much harder (in terms of data
movement pattern) for a classical implementation than $Q_{2^n}$. In this sense, such a factorization of
$Q_{2^n}$ is rather counterintuitive from a classical computing point of view since it involves the use of
the permutation matrices $\Pi_{2^n}$ and $P_{2^n}$ and thus it is highly inefficient for a classical implementation.

5 Discussion and Conclusion 

In this paper, we developed fast algorithms and efficient circuits for quantum wavelet transforms. 
Assuming an efficient quantum circuit for a given wavelet kernel and starting with a high level 
description of the packet and pyramid algorithms, we analyzed the feasibility and efficiency of the 
implementation of the packet and pyramid algorithms by using the given wavelet kernel. We also 
developed efficient and complete gate-level circuits for two representative wavelet kernels, the Haar 
and Daubechies kernels. We gave the first complete time and space complexity analysis of the 
quantum Haar wavelet transform. We also described three complete circuits for the Daubechies
wavelet kernel. In particular, we showed that the Daubechies kernel can be implemented by using
the circuit for QFT. Given the problem of decoherence, exploitation of parallelism in quantum
computation is a key issue in practical implementation of a given computation. To this end, we are 
currently analyzing the algorithms of this paper in terms of their parallel efficiency and developing 
more efficient parallel quantum wavelet algorithms. 

As shown in this paper, permutation matrices play a pivotal role in the development of quantum
wavelet transforms. In fact, not only do they arise explicitly in the packet and pyramid algorithms
but they also play a key role in the factorization of wavelet kernels. For classical computing, the
implementation of permutation matrices is trivial. However, for quantum computing, it represents
a challenging task and demands new, unconventional, and even counterintuitive (from a classical
computing viewpoint) techniques. For example, note that most of the factorizations developed in
this paper for the permutation matrices $\Pi_{2^n}$, $P_{2^n}$, and $Q_{2^n}$ were not previously known in classical computing
and, in fact, they are not at all efficient for a classical implementation. Also, implementation of the
permutation matrices reveals some of the surprises of quantum computing in contrast to classical
computing, in the sense that certain operations that are hard to implement in classical computing
are easier to implement in quantum computing and vice versa. As a concrete example, note that
while the classical implementation of the permutation matrices $\Pi_{2^n}$ and (particularly) $P_{2^n}$ is much
harder (in terms of data movement pattern) than that of the permutation matrix $Q_{2^n}$, their quantum
implementation is much easier and more straightforward than that of $Q_{2^n}$.

In this paper, we focussed on the set of permutation matrices arising in the development of 
quantum wavelet transforms and analyzed three techniques for their quantum implementation. 
However, it is clear that the permutation matrices will also play a major role in deriving compact 
and efficient factorizations, i.e., with polynomial time and space complexity, for other unitary 
operators by exposing and exploiting their specific structure. Therefore, we believe strongly that 
a more systematic study of permutation matrices is needed in order to develop further insight into 
efficient techniques for their implementation in quantum circuits. Such a study might eventually 
lead to the discovery of new and more efficient approaches for the implementation of unitary 
transformations and therefore quantum computation. 






Acknowledgement 

The research described in this paper was performed at the Jet Propulsion Laboratory (JPL),
California Institute of Technology, under contract with National Aeronautics and Space
Administration (NASA). This work was supported by the NASA/JPL Center for Integrated Space
Microsystems (CISM), NASA/JPL Advanced Concepts Office, and NASA/JPL Autonomy and Information
Technology Management Program.

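Appendix: Commutation of the Operators $I_{2^i} \otimes \Pi_{2^{n-i}}$ with $N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}}$

We first prove that every operator of the form $I_{2^i} \otimes \Pi_{2^{n-i}}$, for $i = n-2$ to $1$, commutes with
all the operators of the form $N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}}$, for $j = i$ to $2$, by simply showing that

$$
(I_{2^i} \otimes \Pi_{2^{n-i}})(N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}}) = (N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}})(I_{2^i} \otimes \Pi_{2^{n-i}}) \tag{61}
$$

The matrix $I_{2^i} \otimes \Pi_{2^{n-i}}$ is a block diagonal matrix and therefore can be written as

$$
I_{2^i} \otimes \Pi_{2^{n-i}} = I_2 \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \oplus I_{2^j-2} \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \tag{62}
$$

where each block $I_{2^{i-j}} \otimes \Pi_{2^{n-i}}$ has dimension $2^{n-j}$ and reduces to $\Pi_{2^{n-i}}$ for $j = i$. It can then be shown that

$$
\left(I_2 \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \oplus I_{2^j-2} \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}})\right)(N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}}) = N \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \oplus I_{2^j-2} \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \tag{63}
$$

and

$$
(N \otimes I_{2^{n-j}} \oplus I_{2^n-2^{n-j+1}})\left(I_2 \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \oplus I_{2^j-2} \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}})\right) = N \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \oplus I_{2^j-2} \otimes (I_{2^{i-j}} \otimes \Pi_{2^{n-i}}) \tag{64}
$$

It now remains to show that every operator of the form $I_{2^i} \otimes \Pi_{2^{n-i}}$ commutes with the operator
$N \otimes I_{2^{n-1}}$. This is simply proved by first using the fact that

$$
I_{2^i} \otimes \Pi_{2^{n-i}} = I_2 \otimes (I_{2^{i-1}} \otimes \Pi_{2^{n-i}}) \tag{65}
$$

and then showing that

$$
(I_2 \otimes (I_{2^{i-1}} \otimes \Pi_{2^{n-i}}))(N \otimes I_{2^{n-1}}) = (N \otimes I_{2^{n-1}})(I_2 \otimes (I_{2^{i-1}} \otimes \Pi_{2^{n-i}})) = N \otimes I_{2^{i-1}} \otimes \Pi_{2^{n-i}} \tag{66}
$$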
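The commutation property can also be confirmed by brute force for a small number of qubits. The sketch below assumes a particular perfect shuffle convention (basis index $b \cdot 2^{m-1} + x$ sent to $2x + b$ for an $m$-qubit shuffle); the commutation itself only relies on the block structure, so the result does not depend on that choice. It tests every pair $1 \le j \le i \le n-2$ for $n = 4$. Helper names are illustrative.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def direct_sum(a, b):
    top = [row + [0] * len(b[0]) for row in a]
    bottom = [[0] * len(a[0]) + row for row in b]
    return top + bottom


def perfect_shuffle(m):
    """Assumed convention: basis index b * 2**(m-1) + x is sent to 2 * x + b."""
    size, half = 2 ** m, 2 ** (m - 1)
    p = [[0] * size for _ in range(size)]
    for b in range(2):
        for x in range(half):
            p[2 * x + b][b * half + x] = 1
    return p


N = [[0, 1], [1, 0]]


def conditional_flip(n, j):
    """(N x I_{2^(n-j)}) + I_{2^n - 2^(n-j+1)} for j = 1, ..., n."""
    block = kron(N, identity(2 ** (n - j)))
    rest = 2 ** n - 2 ** (n - j + 1)
    return direct_sum(block, identity(rest)) if rest > 0 else block


n = 4
all_commute = True
for i in range(1, n - 1):                       # i = 1, ..., n - 2
    shuffle_i = kron(identity(2 ** i), perfect_shuffle(n - i))
    for j in range(1, i + 1):                   # j = 1, ..., i
        flip_j = conditional_flip(n, j)
        if matmul(shuffle_i, flip_j) != matmul(flip_j, shuffle_i):
            all_commute = False
print(all_commute)  # True
```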
Figure 1: A circuit for implementation of quantum Fourier transform, QFT (from [15]).

Figure 2: The $\Pi_4$ gate (a) and its implementation by using three XOR (Controlled-NOT) gates (b).

Figure 3: A circuit for implementation of Perfect Shuffle permutation matrix, $\Pi_{2^n}$.

Figure 4: Circuits for implementation of Bit Reversal permutation matrix, $P_{2^n}$, for $n$ even (a) and for $n$ odd (b).

Figure 5: A block-level circuit for Haar wavelet (from [20]).

Figure 6: A block-level circuit for implementation of Hoyer's factorization of $D^{(4)}_{2^n}$.

Figure 7: A block-level circuit for implementation of new factorization of $D^{(4)}_{2^n}$.

Figure 8: A circuit for implementation of operator $T_{2^n}$.

Figure 9: A circuit for implementation of operator $P_{2^n} T_{2^n} P_{2^n}$.

Figure 10: A circuit for implementation of $D^{(4)}_{2^n}$ by using recursive factorization of $Q_{2^n}$.